Boosting SARS-CoV-2 detection combining pooling and multiplex strategies

RT-qPCR is the gold standard technique available for SARS-CoV-2 detection. However, the long test run time and the costs associated with this type of molecular testing are a challenge in a pandemic scenario. Due to high testing demand, especially for monitoring highly vaccinated populations facing the emergence of new SARS-CoV-2 variants, strategies that increase testing capacity and save costs are needed. We evaluated an RT-qPCR pooling strategy as both a simplex and a multiplex assay, and performed in-silico statistical modeling analysis validated with specimens obtained from a mass testing program of the Industry Federation of the State of Rio de Janeiro (Brazil). Although sensitivity was reduced in pools of 32 samples in the simplex assay, high test sensitivity was maintained when 16 or 8 samples were pooled. These data were validated with the results obtained in our mass testing program, with a cost saving of 51.5% that already accounts for the expenditure on pooled samples that had to be analyzed individually. We also demonstrated that the pooling approach using 4 or 8 samples tested with a triplex combination in RT-qPCR can be applied without sensitivity loss, mainly when combining the Nucleocapsid (N) and Envelope (E) gene targets. Our data show that combining pooling with an RT-qPCR multiplex assay could strongly contribute to mass testing programs, with high cost savings and low reagent consumption while maintaining test sensitivity. In addition, the test capacity is predicted to increase considerably, which is fundamental for controlling the spread of the virus in the current pandemic scenario.

www.nature.com/scientificreports/
Pharmaceutical treatment options are still limited, and measures that maintain viral surveillance, such as contact tracing and rapid diagnosis and isolation of confirmed cases, remain key factors in controlling the spread of SARS-CoV-2 [4]. In this context, testing is essential for transmission control and for the maintenance of normal activities after lockdown, especially considering the asymptomatic cases that are usually not tested in standard public health surveillance.
Mass testing is a critical strategic approach for epidemic control, and it has pushed the global demand for diagnostic tests beyond the supply capacity [5]. The current wave of the Omicron variant exemplifies the problem of test access and availability. Several regions are facing a shortage of SARS-CoV-2 diagnostic tests, which will negatively affect the control of viral spread, since isolation of positive cases will be less efficient. Middle- and low-income countries have always tested at lower rates than high-income countries. For instance, up to October 22, 2021, the United Kingdom had reported a total of 8,641,225 cumulative cases and had performed, until the 11th day, 4,558,362 diagnostic tests per million inhabitants, while the United States and Brazil, with 44,940,696 and 21,680,488 cumulative cases, respectively, had tested 1,971,408 and 297,348 per million inhabitants [2,6].
Virus detection in human respiratory tract samples is the standard diagnosis of ongoing acute infections and represents a reliable approach to managing virus transmission [7]. WHO recommends quantitative Reverse-Transcription Polymerase Chain Reaction (RT-qPCR) as the standard method for molecular SARS-CoV-2 detection due to its high sensitivity and specificity [8]. Frequently, the protocols used for SARS-CoV-2 diagnosis include the detection of two viral targets and one human internal control, performed either in single reactions or in a multiplex format for each patient tested, to increase the reliability of the results [8,9].
Pooling methodologies are based on the combination of multiple specimens in a single analysis [10,11] and could be applied to screen large numbers of individuals during routine diagnosis, being less expensive and time-consuming than individual testing [12]. Another strategy that could be used for diagnostic optimization is the multiplex assay, which aims to reduce the number of RT-qPCR reactions needed by performing one reaction for both viral targets and the human internal control.
Furthermore, both methodologies (pooling test and multiplex assay) could help more countries to perform mass testing; however, the combination of both strategies has not been fully explored, even though real cost savings can be achieved. The loss of sensitivity caused by sample dilution, the choice of a combination of targets and probes for the multiplex assay, and the increase in false-negative results are some of the problems that need to be considered if both methodologies are to be employed.
In this work, we describe a detailed optimization of a pooling test protocol and a combination of pooling and multiplex assay to offer an economically viable approach for reliable mass diagnostic services, according to the prevalence of positive cases in the evaluated population.

Results
Evaluation of pooling samples with isolated SARS-CoV-2. Serial dilution curves of a SARS-CoV-2 viral stock showed a low detection limit for both N1 and N2 targets (0.001 and 0.01 infectious viral particles/mL, respectively) (Fig. S1A). We measured total viral genomic RNA in the titrated viral stock and calculated that infectious particles were 10^3-fold fewer than the total viral particles present; thus, the detection limit of this test is 1 RNA copy. Mixtures of SARS-CoV-2 titrated stocks and SARS-CoV-2-negative VTM pools in different proportions (1:0; 1:4; 1:8; 1:12; 1:16) were tested. Although Ct values increased when viral stocks were diluted in VTM, compared to the undiluted virus, the accuracy in detecting SARS-CoV-2 targets was maintained. At least 10 infectious virus particles were still detected with pooled viral stocks (Fig. S1B,C).

Evaluation of pooling samples with clinical samples.
The pooling strategy with up to 32 samples in a single pool showed that SARS-CoV-2 detection, considering the N1 gene, was highly sensitive in samples with individual Ct values below 29, with all tested samples being positive according to the CDC criteria (Fig. 1) [7]. However, for lower viral loads (samples with individual Ct values higher than 29, generally observed in patients at the beginning and the end of the infection process), poor results were found for pools consisting of 16 or 32 specimens. Detection of the N1 gene was possible in 86% of samples for the 1:16 pool and 57% of samples for the 1:32 pool (Fig. 1A). The sensitivity of the test also decreased for the N2 gene, with detection possible in 86% of samples for the 1:8 pool, 57% for the 1:16 pool, and 43% for the 1:32 pool (Fig. 1B). Samples with Ct values higher than 34 reduced the test sensitivity when 16 or 32 patients were pooled (Fig. 1). Absolute Ct values for all the sample pools and RNA pools tested are summarized in Tables S1 and S2, respectively. PCA performed with pooled VTM samples and pooled RNA samples did not detect any grouping pattern among them (Fig. S2; PC-1 = 98% and PC-2 = 1%; Tables S1 and S2), which means that no difference was observed between the two pooling approaches. Negative and positive controls, as well as the Ct curve from pools using only negative patient samples, were validated in our analyses (Fig. S3).
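The loss of detection in large pools at high Ct values is consistent with simple dilution arithmetic. The sketch below is an illustration only: it assumes a single positive specimen per pool and an idealized amplification efficiency (template doubling each cycle), neither of which is measured in this study, and the function names are hypothetical. The Ct cutoff of 40 is the CDC positivity criterion used in the text.

```python
import math

CT_CUTOFF = 40.0  # CDC positivity criterion referred to in the text


def expected_ct_shift(pool_size: int, efficiency: float = 1.0) -> float:
    """Expected Ct increase when one positive sample is diluted 1:pool_size.

    Assumes each PCR cycle multiplies the template by (1 + efficiency).
    efficiency = 1.0 means ideal doubling (an assumption, not a measurement).
    """
    if pool_size < 1:
        raise ValueError("pool_size must be >= 1")
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return math.log(pool_size) / math.log(1.0 + efficiency)


def max_detectable_individual_ct(pool_size: int,
                                 cutoff: float = CT_CUTOFF,
                                 efficiency: float = 1.0) -> float:
    """Highest individual Ct expected to stay below the cutoff after pooling."""
    return cutoff - expected_ct_shift(pool_size, efficiency)


for n in (4, 8, 16, 32):
    shift = expected_ct_shift(n)
    limit = max_detectable_individual_ct(n)
    print(f"pool of {n:2d}: Ct shift ~{shift:.1f}, individual Ct limit ~{limit:.1f}")
```

Under these assumptions a pool of 16 or 32 adds roughly 4 to 5 cycles, so individual samples with Ct values in the mid-30s approach the cutoff, which matches the direction of the sensitivity loss reported above. Real shifts may differ because of extraction yield and assay efficiency.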
In silico pooling analyses. To assess the advantages of the pooling approach, we used previous RT-qPCR results obtained in the diagnostic analyses performed with industrial workers of Rio de Janeiro state as a basis to calculate the prevalence rates (%) of positive cases and to build the statistical modeling methodology. With the in-silico methodology established, it was possible to construct a matrix evaluating the cost savings for each pool size given the prevalence of positive cases. Based on this matrix, it is possible to suggest ideal pool sizes according to the prevalence rates and the cost-saving percentages for any given population (Table 1).
This mathematical modeling shows that populations with prevalence rates as low as 1% may reduce costs by up to 80% using 8 or 12 specimens per pool. However, as the prevalence rate increases, the cost saving is drastically reduced in pools with a large number of samples, because the need to test individually every sample from a positive pool increases the overall cost. Considering prevalence values equal to 5, 7.5, and 10%, the best results were observed for 1:4 pooling, with savings of 57, 48, and 41%, respectively. As such, for prevalence rates higher than 1% but lower than 10%, pool sizes of 4 and 8 return better cost savings than larger pool sizes. The cost modeling also demonstrates that as both pool size and population prevalence increase, the savings become marginally lower until the cost surpasses the value of a single test, limiting the optimal cost savings to a well-defined bounded range (Fig. 2).
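The savings matrix described above can be sketched in a few lines. This is a minimal illustration, assuming the two-stage cost model given in the in-silico experimental design (Eqs. 1 and 2) with the single-test cost normalized to 1. The function names and the grid of pool sizes and prevalences are chosen here for illustration and are not taken from the authors' implementation.

```python
def pool_negative_probability(p: float, n: int) -> float:
    """Eq. 1: probability that a pool of n samples contains no positive."""
    return (1.0 - p) ** n


def relative_cost(p: float, n: int) -> float:
    """Eq. 2 with v = 1: expected tests per individual for pool size n >= 2.

    1/n is the share of the pooled test; the second term is the individual
    retest, needed only when the pool is positive.
    """
    if n < 2:
        raise ValueError("pooling requires n >= 2")
    return 1.0 / n + (1.0 - pool_negative_probability(p, n))


def savings_percent(p: float, n: int) -> float:
    """Cost saving relative to testing every sample individually."""
    return 100.0 * (1.0 - relative_cost(p, n))


def best_pool_size(p: float, sizes) -> int:
    """Pool size with the highest predicted saving among the candidates."""
    return max(sizes, key=lambda n: savings_percent(p, n))


POOL_SIZES = (4, 8, 12, 16, 32)          # illustrative grid
PREVALENCES = (0.01, 0.05, 0.075, 0.10)  # illustrative grid

for p in PREVALENCES:
    row = ", ".join(f"n={n}: {savings_percent(p, n):5.1f}%" for n in POOL_SIZES)
    print(f"prevalence {p:.1%} -> {row} | best n = {best_pool_size(p, POOL_SIZES)}")
```

With this grid the sketch selects a pool of 4 for prevalences of 5% to 10% and gives savings close to the percentages quoted above. Small differences from Table 1 are expected from rounding. A negative saving means pooling costs more than individual testing.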

Validation of pooling strategies with clinical samples.
Previous data from the COVID-19 diagnostic testing performed with industrial workers showed a prevalence of 7.8% positive cases in the evaluated population. According to the in-silico analysis performed, pooling four samples was the best choice for cost optimization (Table 1). To assess the precision of the in-silico model, the pooling strategy was applied in the mass testing program for industrial workers of Rio de Janeiro State. A total of 6096 samples were processed at the SESI Innovation Center for Occupational Health, constructing 1524 pools (Table 2). Of those pools, 365 were positive (24.0%), which required 1460 additional RT-qPCR tests to identify the positive samples. Overall, a saving of 51.1% was observed using this strategy (Table 2). This result agrees with the prediction of the statistical model (Table 1).
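The observed saving can be reproduced directly from the counts reported above. The short check below is only an arithmetic sketch. The variable and function names are illustrative, and it assumes every positive pool was fully retested (four individual tests per positive pool), as the reported 1460 additional tests imply.

```python
N_SAMPLES = 6096       # samples processed
POOL_SIZE = 4          # samples per pool
N_POOLS = 1524         # pools tested
POSITIVE_POOLS = 365   # pools with a positive signal


def observed_savings(n_samples: int, n_pools: int, n_retests: int) -> float:
    """Fraction of tests saved relative to testing every sample individually."""
    tests_run = n_pools + n_retests
    return 1.0 - tests_run / n_samples


retests = POSITIVE_POOLS * POOL_SIZE            # individual tests after deconvolution
pool_positivity = POSITIVE_POOLS / N_POOLS      # share of positive pools
savings = observed_savings(N_SAMPLES, N_POOLS, retests)

print(f"retests: {retests}")
print(f"pool positivity: {pool_positivity:.1%}")
print(f"observed saving: {savings:.1%}")
```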

Development of a multiplex qPCR-based approach for the SARS-CoV-2 diagnosis in pooling samples.
Since our results with pooling samples maintained the sensitivity when using the singleplex CDC assay, we decided to evaluate whether this strategy could maintain sensitivity while using a multiplex assay. When we analyzed the efficiency of the multiplex RT-qPCR by itself, the efficiency of each target detected alone (Fig. 3A,B and Table 3) was equivalent to the efficiency of detecting two viral targets in combination (Fig. 3C and Table 3) and two viral targets and RNase P together (Fig. 3D,E and Table 3). The N-FAM/E-Cy5/RNaseP-Hex combination gave the best results for RNaseP detection when compared to N-FAM/Orf1ab-Cy5/RNaseP-Hex in the 1:8 proportion (Fig. 3D and Fig. S4E; Table S3). Therefore, a multiplex using the N-FAM/E-Cy5/RNaseP-Hex combination performs well in pools of both 4 and 8 samples (Fig. 3C,D).

Validation of N-FAM/E-Cy5/RNaseP-Hex multiplex alone and combined with pooling strategy for clinical samples.
The N-FAM/E-Cy5/RNaseP-Hex multiplex was the most efficient combination in the previous assays and was therefore chosen for use with clinical samples. The sensitivity of the multiplex-pooling strategy was evaluated with 38 individual samples (Fig. 4; Table 3). Previously, we quantified these samples using the singleplex strategy recommended by the CDC [8] (Fig. 4A). The triplex results showed that it is possible to detect SARS-CoV-2 even in samples with Ct values higher than 30 (Fig. 4B; Table S3). Considering the CDC criteria, for positive samples with Ct values below 40 in both viral targets, the sensitivity of the triplex with the N-FAM/E-Cy5/RNaseP-Hex combination was 84.2% (32/38) (Fig. 4B; Table S3). Surprisingly, pools of four and eight samples gave the same sensitivity (89.47%) considering all Ct ranges (Fig. 4C,D; Table S3). Looking specifically at Ct values up to 26, the multiplex pooling strategy had a sensitivity of 96.66% (Fig. 4C,D; Table S3). A total of 38 samples pooled gave positive signals for SARS-CoV-2 even when using an 8-specimen pool, without considerable sensitivity loss in the same Ct range (Fig. 4B,D; Table S3). Our results demonstrated that multiplex pooling of up to eight samples using N-FAM/E-Cy5/RNaseP-Hex gives the best results in samples with Ct values up to 31 (Fig. 4B,D; Table S3).
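Because the sensitivity estimates above rest on 38 samples, it can help to see how much uncertainty such a proportion carries. The sketch below recomputes the reported 32/38 sensitivity and adds a 95% Wilson score interval. The interval is an illustration added here and is not part of the original analysis, and the function names are hypothetical.

```python
import math


def sensitivity(detected: int, true_positives: int) -> float:
    """Fraction of known-positive samples that the assay detected."""
    if true_positives <= 0 or not 0 <= detected <= true_positives:
        raise ValueError("need 0 <= detected <= true_positives and true_positives > 0")
    return detected / true_positives


def wilson_interval(detected: int, true_positives: int, z: float = 1.96):
    """Wilson score interval for a proportion (z = 1.96 gives about 95%)."""
    n = true_positives
    p_hat = sensitivity(detected, n)
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


low, high = wilson_interval(32, 38)
print(f"sensitivity: {sensitivity(32, 38):.1%}")
print(f"95% Wilson interval: {low:.1%} to {high:.1%}")
```

The wide interval is a reminder that differences of a few percentage points between pool sizes may not be distinguishable with this sample size.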

Figure 2. Pooling test savings given pool size and prevalence. Cost savings surface depicting optimal savings crests at pool sizes around 4 and 8. For populations with prevalence around 1%, many pool sizes are profitable, but as prevalence increases, cost savings are drastically reduced.

Discussion
There are some options available for SARS-CoV-2 diagnostics that rely on the detection of viral proteins (antigen rapid tests) or the viral genome (RT-qPCR). However, RT-qPCR is still the gold standard tool due to its high sensitivity and specificity. For testing large populations, however, it becomes a limited technique because of the vast number of tests required and the shortage of reagents, equipment, and consumables in diagnostic laboratories around the world [14,15].
Dorfman designed the strategy of pooling samples in the 1940s to screen for syphilis infection in large populations of soldiers [16]. Since then, pooling has been widely used to detect other pathogens for diagnostic purposes and even to study the prevalence of these agents in a defined population [10,11,17,18]. In a pandemic situation such as COVID-19, pooling samples may be an interesting approach to overcome limitations related to large-scale diagnosis and to give access to mass testing, in order to provide multi-time-point surveillance and to define the prevalence rates in a certain region [19,20].
One of the main concerns of pooling is the size of the assembled pools, which should be evaluated according to the prevalence of the pathogen in the study population. Some studies have demonstrated that this approach has the most potential for enabling mass testing at low cost for diseases with low population prevalence, including for SARS-CoV-2 detection [12,21]. In addition, pooling has shown great potential for repeated testing of the same population over consecutive periods, which could be an efficient strategy for disease control [18]. However, the prediction of the ideal pool size (Table 1) requires a careful in silico analysis; otherwise, the cost savings will not be achieved.
Mathematically, applying Dorfman's approach may yield savings as high as 90% within populations with a prevalence close to 1%. Our study adopted the statistical modeling approach and validated the data by pooling biological samples for COVID-19 diagnosis, confirming that the pool size must be selected according to the prevalence rate of positive cases in the population (Fig. 2). Moreover, this combined analysis is essential to optimize limited resources and to apply mass testing, enabling the management and reduction of underreporting, which is observed globally but especially in large and developing countries [22,23]. For SARS-CoV-2 detection, other published studies performed pooling test validations based on mathematical models. Abdalhamid et al. used a web-based application to calculate pool size and reached results similar to ours, with a recommended pool size of five samples for a 5% prevalence rate and a saving of 57% in tests [12].
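As a worked check of the comparison above, the two-stage cost model from the in-silico experimental design (Eqs. 1 and 2) can be evaluated at a prevalence of 5% with a pool of five samples. This is a sketch that assumes the same model applies to the cited result, which the source does not state:

$$\frac{V}{v} = \frac{1}{n} + 1 - (1 - p)^{n} = \frac{1}{5} + 1 - 0.95^{5} \approx 0.200 + 0.226 = 0.426$$

The expected saving is therefore about $1 - 0.426 \approx 57\%$, in line with the figure quoted above. Under the same model, pooling reduces the number of tests only while $(1 - p)^{n} > 1/n$; for $n = 4$ this corresponds to a prevalence below roughly 29%.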
Although Yelin et al. have demonstrated the possibility of making pools using RNA or swab samples with the same test quality, the authors used a limited number of samples [24]. Here, our results showed that pooling RNA or nasopharyngeal swab samples has the same efficiency for SARS-CoV-2 diagnosis, which supports choosing nasopharyngeal swab pooling to overcome the RNA extraction bottleneck. Another critical factor is sample dilution, which might be responsible for inconclusive or even false-negative results, depending on the Ct range of the analyzed samples. In this work, we thoroughly tested different concentrations of viral stock preparations and clinical samples in pools of up to 32 samples. We demonstrated that even samples at concentrations close to the RT-qPCR detection limit were still positive in pools of up to 8 samples. Larger pools could also lead to problems with samples of low viral load, whose detection could be missed, and the logistics of assembling and deconvoluting these pools need to be handled carefully to avoid cross-contamination. One study proposed that deconvoluting pools in stages, according to the prevalence rate, could optimize the process and increase the number of samples pooled [25]. Nevertheless, adding more steps to the diagnostic chain could delay the final diagnosis and consume more resources.
Our study combines in silico analyses and test validation to ensure a safe methodology that can be widely used, one that will reduce false and inconclusive diagnostic results and save costs. We implemented the pooling methodology in the COVID-19 mass testing program of the Industry Federation of the State of Rio de Janeiro. We observed substantial gains from reducing the RT-qPCR run time and the use of reagents and general consumables (Table 2). The statistical model predicted a cost saving of 48% (Table 1) for the pooling of four nasopharyngeal samples in a population with a prevalence rate of 7.5%. Indeed, we observed a saving of 51.1% in our test routine. This cost saving already considers all the subsequent re-testing for the diagnosis of individual samples and the false-negative rate. In our analysis, the false-positive rate was 15.28%, meaning that 57 of 373 pools that presented a Ct curve were negative when the samples were analyzed individually. This rate is higher than previously reported by other authors [26,27] but could be explained by factors that affect sample quality and stability, since it has already been demonstrated that freeze-thaw of oro/nasopharyngeal samples and the time of storage since sample collection reduce viral load [28-30]. In our case, samples were stored at −80 °C within 4 h of initial handling and thawed for individual testing. Factors associated with mixing several nasopharyngeal samples in one reaction tube, which are diluted out when the samples are tested individually, could also be considered. In addition, we established a stringent parameter for the interpretation of the pooling test, in which any positive signal, regardless of the Ct and the fluorescence signal, was considered positive, even those that would be considered inconclusive in individual tests.
Therefore, excluding results with Ct values above 34 in the pooling test, the rate drops to 5.19%, which is within the range of other studies [26,27]. Given that the in-house COVID-19 diagnostic PCR used in our company costs around US$22.20 per individual sample, pooling samples by four reduces the experimental cost to US$10.85 per test. This approximately twofold reduction was also observed in another study for a prevalence of 5% [26]. The pooling strategy is already fairly well established for COVID-19 mass testing in some countries, such as Germany and India. Our study shows that combining this strategy with in-silico analyses will improve its use. To increase the added value, the combination of the sample pooling strategy and a multiplex RT-qPCR was predicted to yield savings of 47.29% and 25.63% for pools of 4 and 8 samples, respectively, according to the model prediction for a COVID-19 prevalence rate of 7.8%.
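The per-test cost and the pool false-positive rate quoted above follow from simple arithmetic on the reported figures. The check below is only an illustrative sketch with hypothetical names. It applies the observed 51.1% saving to the US$22.20 unit cost, so the result matches the reported US$10.85 only up to rounding.

```python
UNIT_COST_USD = 22.20        # reported cost of one individual in-house test
OBSERVED_SAVINGS = 0.511     # reported saving with pools of four
POOLS_WITH_SIGNAL = 373      # pools that presented a Ct curve
POOLS_NOT_CONFIRMED = 57     # of those, negative on individual retesting


def pooled_cost_per_sample(unit_cost: float, savings_fraction: float) -> float:
    """Effective cost per sample once the pooling saving is applied."""
    return unit_cost * (1.0 - savings_fraction)


def pool_false_positive_rate(not_confirmed: int, with_signal: int) -> float:
    """Share of signal-positive pools not confirmed by individual testing."""
    return not_confirmed / with_signal


cost = pooled_cost_per_sample(UNIT_COST_USD, OBSERVED_SAVINGS)
fp_rate = pool_false_positive_rate(POOLS_NOT_CONFIRMED, POOLS_WITH_SIGNAL)

print(f"cost per sample with pooling: about US${cost:.2f}")
print(f"pool false-positive rate: {fp_rate:.2%}")
```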
Ishige et al. reported that the N + E multiplex is a good combination when evaluated with the internal control hABL1 [31]. Another study corroborated this result using 27 samples in blind testing [32]. They compared singleplex reactions with multiplexes using hRNase P as an internal control. Among them, 12 samples (44.4% positive rate) were positive in the singleplex configuration and 11 in the multiplex, with no sensitivity loss [32]. Another report developed a multiplex assay using the CDC N1 and N2 primers modified with different probes and demonstrated a limit of detection of 50 copies/reaction [33]. That study also analyzed clinical samples, and the multiplex sensitivity was conserved when compared to the CDC singleplex. However, as previously reported [34], the N2 primer showed reduced sensitivity. Thus, a multiplex assay that uses two distinct viral targets would be useful. Despite these promising findings, we decided to evaluate three possible multiplex combinations in association with the sample pooling strategy. We demonstrated that the pooling approach using 4 or 8 samples tested with a multiplex combination in RT-qPCR can be applied without sensitivity loss, mainly when combining the Nucleocapsid (N) and Envelope (E) genes (Fig. 4B), while the sensitivity was lower for the other combinations of target genes evaluated (Table S3).
A recent study that combined sample pooling and multiplexing used only a pool of four samples, and the multiplex was performed using the CDC primers only for the viral N gene and human RNaseP. Despite this difference from our study, a high concordance between singleplex and multiplex results was also observed, with samples with higher Ct values presenting discordant results [35].
In summary, our study demonstrates that implementing a pooling strategy combined with a multiplex assay can boost testing efficiency, save resources, and reduce costs while maintaining test sensitivity. A robust in silico analysis is useful to analyze the prevalence of positive cases in a certain region and supports the design of the best pooling strategy for that population. In populations with a low prevalence of positive cases, this approach could help to implement mass testing and detect the transmission of SARS-CoV-2 in the community, supporting actions to control the spread of the virus.

Samples.
A total of 152 clinical samples were selected from the mass testing program of the Industry Federation of the State of Rio de Janeiro (Brazil), from industry workers, between April and May 2020. The nasopharyngeal swabs were placed in 2.0 mL of DMEM medium (Thermo Fisher), and 1.5 mL of each sample was individually stored at −80 °C in cryotubes until further use. For the pooling validation, we used leftovers from routine testing samples, and no personal, clinical, or demographic data from individuals were accessed or released. The National Committee of Research Ethics (CAAE 36602620.6.0000.5257) reviewed and approved the present study. All methods and experimental protocols were approved and carried out in accordance with guidelines and regulations from the institutional committee.
To determine the sensitivity of different pool sizes (4, 8, and 16 samples each), different amounts of the SARS-CoV-2 A2 isolate (100, 1000, and 10,000 viral infectious particles/mL) were pooled with nasopharyngeal swabs by SARS-CoV-2 qRT-PCR using targets different from the initial test. To assess the combined sensitivity of the multiplex and pooling strategies, we used several dilutions of viral RNA extracted from viruses isolated from tissue culture, combined with the pooling strategies.
Pooling assembly strategies. Here, a comparative analysis between the nasopharyngeal sample and RNA pooling approaches was performed to identify the best method to construct the pools.

Statistical analysis.
Principal Component Analysis (PCA) was performed using Unscrambler Software (version 10.1; CAMO AS; Trondheim, Norway). Grouping patterns were investigated within samples S23-S38, considering as variables the pool type (sample pooling or RNA pooling) and the pool size (4, 8, 16, or 32 samples).
In-silico experimental design using pooling methodology. Mathematical modeling of the pooling strategy was performed based on pool size and the prevalence of positive cases in the population. Each pooled test was modeled as a Bernoulli trial, with the probability of a positive test p based on the results obtained from the clinical nasopharyngeal swab samples collected from industrial workers. The standard case was no positive cases within a pooled test, given by:
$$P_{neg} = (1 - p)^{n} \tag{1}$$
where P_neg is the probability that a pooled sample of size n contains no positives, given the probability p of a positive test, based on the exponential distribution.
To estimate the cost savings related to a pooling strategy, the model below (2)
$$V(v, p, n) = \frac{v}{n} + v\,(1 - P_{neg}) \tag{2}$$